Abstract


We present a simple, efficient, and self-contained construction of a wait-free regular register from Byzantine storage components. Our construction uses a novel building block, called a 1-regular register, which can be implemented from Byzantine fault-prone components with the same round complexity as a safe register, and with only a slight increase in storage space.
1 Introduction


In this paper we consider the problem of constructing wait-free distributed storage from Byzantine fault-prone components in asynchronous settings. Our specific objective is to devise a solution that is simple, efficient, and feasible in practice, yet provides meaningful semantics for higher-level applications. Roughly speaking, a wait-free object is one that always guarantees the liveness of shared memory operations, even in the presence of any number of process (client) failures.

Constructing efficient storage solutions from Byzantine components has received considerable attention recently, as such solutions are useful in a number of emerging application domains. Such solutions were originally introduced in the context of scalable client-server systems. These systems achieve greater scalability than traditional state-machine replication approaches by removing direct communication among the servers, thus reducing the load that each client request (or transaction) imposes on the servers. This approach was pioneered in the Fleet [17] system, and adopted in many others, e.g., SBQ-L [18], Agile Store [11], Coca [28], and [5]. Similar solutions have also been employed in the setting of Storage Area Networks (SANs). SAN technology allows clients to access disks directly over the network, so that the file server bottleneck is eliminated. Examples of SAN-based systems that use disks for information sharing and coordination include Compaq's Petal [13] and Frangipani [23], Disk Paxos [7], Active Disk Paxos [6], and Byzantine Disk Paxos [1]. More recently, solutions of this nature have been adopted in peer-to-peer systems, which consist of a collection of widely spread nodes storing data objects. Naturally, due to their Internet-wide deployment, the storage nodes are prone to malicious attacks, which motivates adopting a Byzantine failure model for the storage nodes. Examples of peer-to-peer systems that adopt storage-centric replication to support availability in the face of Byzantine failures include Rosebud [21] and [14].

Fault-prone storage systems such as those mentioned above can be formally modeled as an asynchronous shared memory system in which up to a threshold t of the memory objects may fail by being non-responsive [3, 10] or by returning arbitrary values [2, 10] (i.e., by being Byzantine); this failure model is called non-responsive arbitrary (NR-Arbitrary) faults [10]. In this paper, we assume that fewer than one fourth of the memory objects can fail. In [1] we show that this assumption is necessary for achieving wait-free implementations as efficient as those presented in this paper; that is, every construction that uses fewer than 4t + 1 objects must have a higher latency.

All existing wait-free Byzantine-resilient storage constructions provide safe register semantics¹ [10, 16, 1]. The only previous direct constructions of objects with stronger (regular or atomic) semantics from Byzantine storage satisfy weaker (non-wait-free) termination conditions [18, 1]. In this paper we focus on wait-free constructions. Safe register semantics, by themselves, are too weak to be directly useful for applications. The focus on these semantics has been justified by the existence of known reductions from wait-free safe registers to stronger ones [19, 27, 8, 12, 24, 25, 26, 9]. However, this approach results in constructions that are not self-contained and, as we argue below, are not tailored to the requirements of a distributed storage system. In this paper, we do not use these reductions as black boxes, but instead capitalize on their techniques in order to derive a new self-contained wait-free regular register construction that is simple, efficient, and feasible in distributed storage environments.

¹A safe register guarantees that every read operation that does not overlap any write returns the latest written value, or the initial value if no value was written; the result of a read operation that does overlap a write operation may be arbitrary.

Most existing constructions of strong memory objects are fairly elaborate. We believe that a major reason for this complexity is that they aim to achieve an atomic register. However, recent studies indicate that storage with regular semantics is sufficient in most cases [7, 22, 1]. A regular register is weaker than an atomic one; roughly speaking, a regular register guarantees that every read operation returns either a value written by a write operation invoked no earlier than the last write operation that completes before the read is invoked (and no later than the read's completion), or the initial value if no value was written before the read. We therefore focus on constructing regular registers in this paper.

Existing constructions of strong (regular or atomic) wait-free objects from weaker ones were not designed with distributed storage in mind. In particular, such constructions have typically focused on bounding the memory size rather than on reducing the number of shared memory accesses. In a distributed setting, however, every memory access incurs a latency of two message delays (a request and a reply), whereas storage space is typically abundant. Therefore, we believe that a practical construction for the model we consider herein should focus on simplicity and on reducing communication costs, even at the cost of using unbounded counters. This is precisely the approach we take in this paper. We give an algorithm that uses unbounded counters (which is acceptable in practice), achieves better latency than all existing regular register implementations, and is very simple to understand and implement.

Our approach further differs from existing constructions in the basic building blocks employed. Traditional shared memory constructions often use a collection of safe single-bit registers, which are mathematical models of flip-flops. In contrast, in a distributed storage setting, we can assume that each storage unit (representing a disk or a server) supports stronger objects. In this paper we introduce a novel intermediate building block, called a 1-regular register. We show that a 1-regular register can be implemented from Byzantine fault-prone components with the same round complexity as a safe register, and with only a slight increase in storage space. We then give a simple and efficient implementation of a wait-free regular register using 1-regular ones.


Outline: The rest of this paper is organized as follows. Section 2 surveys related work. In Section 3, we describe the formal computation model used throughout the paper and give the definitions of various register types. In Section 4, we construct a wait-free 1-regular register from n > 4t base registers, up to t of which can incur Byzantine faults. Finally, in Section 5, we show how to use 1-regular registers in order to construct a wait-free regular register. We note that all but one of the 1-regular registers employed in this construction can be replaced with (weaker) safe ones. Therefore, for completeness, we present a direct safe register construction in the appendix. As noted above, using safe registers instead of 1-regular ones does not reduce the latency, but it slightly reduces the space requirements.


2 Related Work 


Wait-free shared register constructions have been an actively researched area for several decades [19, 27, 8, 12, 24, 25, 26, 9]. Most of the constructions found in the literature aim to implement atomic registers from safe bits. One notable exception is the construction by Lamport in [12], which implements an n-valued regular register using O(n) safe bits. The construction of [27] was purported to be an atomic register construction but, in fact, provides only regular semantics. Peterson [19] and Tromp [24] present constructions of atomic registers from safe bit tracks and atomic control bits. It appears that these constructions can be easily adapted to implement regular registers by replacing the atomic control bits with regular ones, which in turn can be easily obtained from safe bits as shown in [12]; nevertheless, this was neither claimed nor proven in those papers. Although both of these constructions have logarithmic space complexity, the number of shared memory accesses they employ is rather high. Therefore, they are not directly applicable in a distributed setting. Nonetheless, our work benefits from several techniques and ideas underlying these constructions, e.g., using separate tracks to store copies of the register value, and using a handshake mechanism to coordinate between the reader and the writer.

All existing wait-free Byzantine-fault-tolerant register constructions for the distributed storage setting provide only safe semantics [16, 10, 1]. Other constructions achieve stronger semantics at the cost of weaker termination guarantees: the protocols of [18] and [4] implement atomic and regular registers, respectively, but do not guarantee termination in the face of client crashes; and the algorithm of [1] implements a regular register where the read operation is guaranteed to terminate only if it eventually runs in isolation for sufficiently long.

Our 1-regular register notion is a generalization of the pseudo-regular register of [20]. Whereas a read operation on our 1-regular register can return ⊥ only if it is concurrent with more than one write, a read from the pseudo-regular register is allowed to return ⊥ if it is concurrent with one or more writes. Hence, the guarantees it provides correspond to 0-regularity in our terminology.


3 The System Model 


We consider asynchronous shared memory systems consisting of a collection of processes interacting with a finite collection of objects. Objects and processes are modeled as I/O automata [15]; due to space constraints, we do not repeat the details of the I/O automaton model here.

An object automaton's interface is determined by its type, which is a tuple consisting of the following components: (1) a set Vals of values; (2) a set of invocations; (3) a set of responses; and (4) a sequential specification, which is a function from invocations × Vals to responses × Vals. In a shared memory system consisting of processes P_1, P_2, ..., an object of type T interacts with a process P_i by means of input actions of the form a_i, where a is an invocation of T, and output actions of the form b_i, where b is a response of T. An object's external behavior is specified in terms of the properties of its traces (i.e., executions consisting of external actions only). Liveness properties are required to hold only in fair executions, i.e., executions in which each output and internal action has infinitely many opportunities to occur.

The interaction between a process and an object is well-formed if it consists of alternating invocations and responses, starting from an invocation. We only consider systems in which interactions between processes and objects are well-formed. Well-formedness allows an invocation occurring in an execution α to be paired with a unique response (when one exists). If an invocation has a response in α, the invocation is complete; otherwise, it is incomplete. Note that well-formedness does not rule out concurrent operation invocations on the same object by different processes. Nor does it rule out parallel invocations by the same process on different objects, which can be performed in separate threads of control.

Up to a threshold t of the objects may suffer NR-Arbitrary failures [10], i.e., may fail to respond to an invocation, or may respond with an arbitrary value. Any number of the processes may fail by stopping.


3.1 Registers 


A read/write register (or simply, register) type supports an arbitrary set Vals of values with an arbitrary initial value v0 ∈ Vals. Its invocations are read and write(v), v ∈ Vals. Its responses are v ∈ Vals and ack. Its sequential specification, f, requires that every write overwrites the last value written and returns ack (i.e., f(write(v), w) = (ack, v)), and that every read returns the last value written (i.e., f(read, v) = (v, v)). In a shared memory system consisting of processes P_1, P_2, ..., a process P_i interacts with a shared register by means of input actions of the form read_i and write(v)_i, and output actions of the form v_i and ack_i. A read/write register is called k-reader/m-writer if only k (resp. m) processes are allowed to read (resp. write) the register. We use the term multi-reader when the particular number of readers is not important.
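To make the sequential specification concrete, the following Python sketch models f as a pure function from an (invocation, current value) pair to a (response, new value) pair. The tuple encoding of invocations and the names `spec` and `run_sequential` are assumptions made for this illustration only.

```python
from typing import Any, Tuple

ACK = "ack"  # response returned by every write

def spec(invocation: Tuple[Any, ...], current: Any) -> Tuple[Any, Any]:
    """Sequential specification f of a read/write register.

    invocation is ("read",) or ("write", v); returns (response, new value).
    """
    kind = invocation[0]
    if kind == "write":
        # The write overwrites the stored value and responds with ack.
        return ACK, invocation[1]
    if kind == "read":
        # The read returns the stored value and leaves it unchanged.
        return current, current
    raise ValueError(f"unknown invocation {invocation!r}")

def run_sequential(initial: Any, invocations):
    """Apply invocations one at a time, collecting the responses."""
    value, responses = initial, []
    for inv in invocations:
        resp, value = spec(inv, value)
        responses.append(resp)
    return responses
```

For example, starting from the initial value 0, the sequence read, write(5), read yields the responses 0, ack, 5.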

We now define several register properties that will be used throughout the paper. Fix x to be a single-writer/multi-reader (SWMR) or single-writer/single-reader (SWSR) register, and let σ be a sequence of invocations and responses of x.


Safe register. σ is safe [12] if every complete read operation that does not overlap any write operation returns the register's value when the read was invoked (i.e., the latest written value, or the initial value v0 if no value was written). A register is called safe if it has only safe traces.


Regular register. σ is regular [12] if it is safe and, in addition, every read operation that overlaps some write operations returns either one of the values written by the overlapping writes or the register's value before the first overlapping write was invoked. A register is regular if it has only regular traces.


1-regular register. σ is 1-regular if it is safe and, in addition, every read operation that overlaps at most one write operation returns either the value written by the overlapping write or the register's value before the overlapping write was invoked. A read operation that overlaps two or more write operations may, in addition, return a special value ⊥. A register is 1-regular if it has only 1-regular traces.
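The three conditions differ only in what a complete read may return once writes overlap it. The Python sketch below computes that set from the value before the first overlapping write and the values of the overlapping writes. The function name `allowed_results`, the string `BOTTOM` standing for ⊥, the use of None for "any value" (an overlapping read of a safe register), and the parameter k (generalizing 1-regularity to k-regularity, with k = 0 matching the pseudo-regular register discussed in Section 2) are all assumptions of this sketch.

```python
from typing import FrozenSet, Hashable, Optional, Sequence

BOTTOM = "⊥"  # illustrative encoding of the special value ⊥

def allowed_results(kind: str, before: Hashable,
                    overlapping: Sequence[Hashable],
                    k: int = 1) -> Optional[FrozenSet[Hashable]]:
    """Values a complete read may return.

    before: register value when the first overlapping write was invoked
            (or when the read was invoked, if no write overlaps it).
    overlapping: values of the writes that overlap the read, in order.
    Returns None when any value at all is allowed.
    """
    if not overlapping:
        # All conditions agree for a read with no overlapping write.
        return frozenset({before})
    if kind == "safe":
        return None  # an overlapping read of a safe register is unconstrained
    regular = frozenset({before, *overlapping})
    if kind == "regular":
        return regular
    if kind == "k-regular":
        # At most k overlapping writes: behave like a regular read.
        # More than k: the special value ⊥ is allowed in addition.
        return regular if len(overlapping) <= k else regular | {BOTTOM}
    raise ValueError(f"unknown register kind {kind!r}")
```

Calling it with kind="k-regular" and k=1 gives exactly the 1-regular condition defined above.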


Wait Freedom. Register x is wait-free if in any fair execution of any shared memory system that 
includes x, every invocation of x by a correct process is complete. 


4 Wait-free 1-Regular Register Construction 


The implementation of a wait-free SWMR 1-regular register from n > 4t wait-free SWMR regular base registers, up to t of which can incur NR-arbitrary failures, is depicted in Figure 1. The notation INVOKE write(x_i, v) (respectively, INVOKE tmp ← read(x_i)) means that a new thread is started that performs a write on register x_i with value v (respectively, a read of register x_i whose response will be stored in local variable tmp). The notation x_i RESPONDED means that the last thread created by an INVOKE operation on register x_i has completed its execution. Note that this is well defined because we maintain well-formedness using the control flags pending and enabled in such a manner that there is at most one pending thread for each register. Each of the n base registers x_i consists of a cyclic two-value buffer whose elements are denoted x_i[0] and x_i[1], respectively. Each WRITE operation WRITE(v) first chooses a unique, monotonically increasing timestamp ts and then writes the pair (ts, v) to the base registers x_i, such that consecutive WRITE operations alternate between the two components: the kth WRITE (counting from k = 0) updates x_i[k mod 2]. This mechanism ensures that every READ operation that is overlapped by at most one WRITE operation is always able to recover a good value (i.e., a value that upholds regularity) from the values read from either component of the registers, as explained below.
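The alternation between the two buffer components can be sketched in Python as follows. The sketch models only the writer's local state from Figure 1 (the pair w[0], w[1] and the bit turn); the class name `TwoSlotWriter`, the use of integer timestamps, and returning the pair that would be written to every x_i are assumptions of this example, and no base-register I/O is performed.

```python
from typing import Any, List, Tuple

TSVal = Tuple[int, Any]  # (timestamp, value) pair, i.e. an element of TSVals

class TwoSlotWriter:
    """Writer-side state of the 1-regular emulation: a cyclic two-slot buffer."""

    def __init__(self, v0: Any, ts0: int = 0) -> None:
        self.w: List[TSVal] = [(ts0, v0), (ts0, v0)]  # w[0], w[1]
        self.turn = 0          # slot updated by the next WRITE
        self.ts = ts0          # last timestamp used

    def next_pair(self, v: Any) -> Tuple[TSVal, TSVal]:
        """Prepare WRITE(v): return the pair to be written to every x_i.

        Only slot w[turn] changes; the other slot keeps the previous
        WRITE's (ts, v), so consecutive WRITEs alternate between slots.
        """
        self.ts += 1                       # unique, monotonically increasing
        self.w[self.turn] = (self.ts, v)
        pair = (self.w[0], self.w[1])
        # In Figure 1 the flip happens after n - t base writes complete;
        # this sketch flips immediately because it performs no I/O.
        self.turn = 1 - self.turn
        return pair
```

Three successive writes of "a", "b", "c" produce the pairs ((1, a), (0, init)), ((1, a), (2, b)), and ((3, c), (2, b)): each slot always holds one of the two most recent values.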

The READ implementation reads the values from at least n − t base registers and stores the values read from the x_i[0] and x_i[1] components in the arrays w0[1...n] and w1[1...n], respectively. The predicate safe(c, w), where c ∈ TSVals (i.e., a timestamp-value pair) and w is a vector of TSVals, evaluates to true if c appears in at least t + 1 elements of w. A value c that is safe in either w0 or w1 is known to have been returned by at least one correct register x_i and, therefore, to have been written by some previously invoked WRITE. However, the safe predicate by itself is insufficient to ensure regularity, since some old values could still be returned by t + 1 registers (e.g., by t registers that are not up to date and 1 that is faulty). Hence, in order for a safe value c to be returned, we must ensure that all values with a timestamp higher than c.ts are invalid, i.e., could not have been written by a later complete WRITE operation.
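A minimal Python rendering of the safe predicate follows, assuming each response vector is a list of (timestamp, value) pairs in which None marks a base register that has not responded (playing the role of ⊥). The function signature and the None encoding are assumptions of this sketch.

```python
from typing import Any, List, Optional, Tuple

TSVal = Tuple[int, Any]  # (timestamp, value)

def safe(c: TSVal, w: List[Optional[TSVal]], t: int) -> bool:
    """True iff c appears in at least t + 1 entries of w.

    With at most t faulty base registers, t + 1 matching responses imply
    that at least one correct register returned c.
    """
    return sum(1 for x in w if x == c) >= t + 1
```

With t = 1, a stale pair reported by one out-of-date correct register and one faulty register is already safe, which is exactly why safe alone cannot guarantee regularity; a pair reported only by a single faulty register is not.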

The validity check is accomplished by the predicate invalid(c, w), which returns true if and only if, for at least 2t + 1 elements w[j], either w[j].ts < c.ts, or w[j].ts = c.ts ∧ w[j].val ≠ c.val. Since every completely written value can only be overwritten by a higher-timestamped value, at most 2t of the responses can show an older or conflicting pair for a completely written value c (at most t registers that the WRITE did not reach, plus at most t faulty ones). Consequently, a value c can be invalid in w0 (resp. w1) only if it was not written to at least n − t components x_i[0] (resp. x_i[1]), or was not written at all. Hence, every value c that is safe in w0 (resp. w1), and for which all the higher-timestamped values are invalid, is known to be no older than the value written by the last WRITE operation that writes x_i[0] (resp. x_i[1]) and completes before READ is invoked. Therefore, the highest-timestamped such value is guaranteed to preserve regularity.
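The validity check and the final choice of the return value (lines 5-9 of READ in Figure 1) can be sketched in Python as below. It assumes w0 and w1 are lists of (timestamp, value) pairs with None standing for ⊥ (no response) and that t is known to the reader. The predicate names mirror Figure 1, while `select_return` and the `BOTTOM` marker are hypothetical helpers introduced for this example. Candidates are drawn only from values that actually appear in w0 or w1, since a value absent from w cannot be safe.

```python
from typing import Any, List, Optional, Tuple

TSVal = Tuple[int, Any]          # (timestamp, value)
Vec = List[Optional[TSVal]]      # None plays the role of ⊥ (no response)
BOTTOM = object()                # hypothetical marker for the special value ⊥

def safe(c: TSVal, w: Vec, t: int) -> bool:
    # c was returned by at least t + 1 base registers, hence by a correct one
    return sum(1 for x in w if x == c) >= t + 1

def invalid(c: TSVal, w: Vec, t: int) -> bool:
    # At least 2t + 1 entries are older than c, or carry c's timestamp
    # with a different value: c cannot have been completely written.
    ts, v = c
    older = sum(1 for x in w
                if x is not None and (x[0] < ts or (x[0] == ts and x[1] != v)))
    return older >= 2 * t + 1

def higher_valid(c: TSVal, w: Vec, t: int) -> bool:
    # Some entry has a higher timestamp than c and is not invalid.
    return any(x is not None and x[0] > c[0] and not invalid(x, w, t)
               for x in w)

def select_return(w0: Vec, w1: Vec, t: int) -> Any:
    """Lines 5-9 of READ: highest-timestamped candidate, else ⊥."""
    candidates = []
    for w in (w0, w1):
        for c in set(x for x in w if x is not None):
            if safe(c, w, t) and not higher_valid(c, w, t):
                candidates.append(c)
    if candidates:
        return max(candidates, key=lambda c: c[0])[1]
    return BOTTOM
```

With n = 5 and t = 1, a forged pair (9, "evil") reported by one faulty register is not safe, and because three responses are older than it, it is also invalid; it therefore cannot suppress the genuinely written value.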

The informal discussion above is formalized in the following two lemmas: 


Lemma 1. If READ returns v ≠ ⊥, then v is either a value that was written by an overlapping WRITE, or the register's value before the first overlapping WRITE was invoked.


Proof. Let R be a READ operation that completes, and let v be the value written by the last WRITE operation W that completes before R is invoked. By the WRITE implementation, there exists a timestamp ts such that (ts, v) is written to either the x_i[0] or the x_i[1] component of at least n − t base registers x_i. Without loss of generality, assume that (ts, v) was written to the x_i[0] components. Since READ returns a value ≠ ⊥, by the READ code, C0 ∪ C1 ≠ ∅. There are two cases to consider.

First, suppose that C0 = ∅ and C1 ≠ ∅. By line 5, this implies that either (1) ¬safe(c, w0) for all c ∈ TSVals, or (2) for every c ∈ TSVals such that safe(c, w0) holds, higherValid(c, w0) is also true. In both cases, READ overlaps at least one complete WRITE operation that writes (ts', v') with ts' > ts to the x_i[1] components of the base registers x_i. Hence, there are at most 2t base registers x_i such that x_i[1].ts < ts or (x_i[1].ts = ts ∧ x_i[1].val ≠ v). Thus, for any c' ∈ TSVals such that c'.ts < ts, higherValid(c', w1) is true. Therefore, each c1 ∈ C1 was returned by at least one correct register (since safe(c1, w1) holds), and c1.ts > ts. Hence, c1.val was written by a WRITE operation that follows W. Since C1 ≠ ∅, the return value is c1.val for some c1 ∈ C1. Hence, regularity is maintained.

Second, suppose that C0 ≠ ∅. In this case, there are at most 2t base registers x_i such that x_i[0].ts < ts or (x_i[0].ts = ts ∧ x_i[0].val ≠ v). Thus, for any c' ∈ TSVals such that c'.ts < ts, higherValid(c', w0) is true. Since C0 ≠ ∅, each c0 ∈ C0 was returned by at least one correct register (since safe(c0, w0) holds), and c0.ts ≥ ts. Furthermore, by line 8, the value returned must have been written with a timestamp at least as high as c0.ts. Hence, the return value upholds regularity. □


Types: TSVals = TS × Vals, with selectors ts, val;
Shared objects: regular registers x_i ∈ TSVals × TSVals, 1 ≤ i ≤ n, whose components are addressed by x_i[0] and x_i[1]; initially x_i = ((ts0, v0), (ts0, v0));

WRITE Emulation

Local variables:
    enabled[1...n], pending[1...n] ∈ Boolean, initially ∀i enabled[i] = pending[i] = false;
    w[2] ∈ TSVals, initially w[0] = w[1] = (ts0, v0);
    turn ∈ {0, 1}, initially 0;
    ts ∈ TS;

WRITE(v):
    choose ts ∈ TS larger than previously used;
    w[turn] ← (ts, v);
    for 1 ≤ i ≤ n, enabled[i] ← true;
    repeat
        CHECK;
    until |{i : ¬enabled[i] ∧ ¬pending[i]}| ≥ n − t;
    turn ← ¬turn;
    return ack;

CHECK:
    if (∃i : enabled[i] ∧ ¬pending[i]) then
        (enabled[i], pending[i]) ← (false, true);
        INVOKE write(x_i, (w[0], w[1]));
    if (∃i : x_i RESPONDED) then
        pending[i] ← false;

READ Emulation

Local variables:
    enabled[1...n], pending[1...n], old[1...n] ∈ Boolean, initially ∀i enabled[i] = pending[i] = false;
    w0[1...n], w1[1...n], tmp0[1...n], tmp1[1...n] ∈ TSVals;

Predicate and macro definitions:
    invalid((ts, v), w) ≜ |{i : w[i] = (ts', v') ∧ (ts' < ts ∨ (ts' = ts ∧ v ≠ v'))}| ≥ 2t + 1;
    safe(c, w) ≜ |{i : w[i] = c}| ≥ t + 1;
    higherValid(c, w) ≜ ∃i : w[i] = c' ∧ c'.ts > c.ts ∧ ¬invalid(c', w);

READ:
1:  for 1 ≤ i ≤ n, if (pending[i]) then old[i] ← true;
2:  for 1 ≤ i ≤ n, enabled[i] ← true;
3:  for 1 ≤ i ≤ n: w0[i] ← ⊥, w1[i] ← ⊥;
4:  repeat
        CHECK;
    until |{i : ¬enabled[i] ∧ ¬pending[i]}| ≥ n − t;
5:  C0 ← {c0 ∈ TSVals : safe(c0, w0) ∧ ¬higherValid(c0, w0)};
6:  C1 ← {c1 ∈ TSVals : safe(c1, w1) ∧ ¬higherValid(c1, w1)};
7:  if (C0 ∪ C1 ≠ ∅) then
8:      return c.val : c.ts = max{c'.ts : c' ∈ C0 ∪ C1};
9:  return ⊥;

CHECK:
    if (∃i : enabled[i] ∧ ¬pending[i]) then
        (enabled[i], pending[i]) ← (false, true);
        INVOKE (tmp0[i], tmp1[i]) ← read(x_i);
    if (∃i : x_i RESPONDED) then
        if (¬old[i]) then
            (w0[i], w1[i]) ← (tmp0[i], tmp1[i]);
        pending[i] ← false; old[i] ← false;

Figure 1: The 1-regular register emulation.


Lemma 2. If R = READ returns ⊥, then R is concurrent with at least two WRITE operations.


Proof. Let W = WRITE(v) be the last WRITE operation that completes before R is invoked, and let ts be the timestamp used to write v to the base registers. Suppose w.l.o.g. that (ts, v) was written to the x_i[0] components of the base registers x_i. Assume, by contradiction, that R overlaps at most one WRITE operation W' = WRITE(v'), and let ts' be the timestamp used to write v' to the base registers. Since the base registers updated by W and W' intersect the base registers read by R in at least t + 1 correct registers, safe((ts, v), w0) or safe((ts', v'), w1) (or both) holds. Moreover, there are at most 2t registers x_i such that x_i[0].ts < ts ∨ (x_i[0].ts = ts ∧ x_i[0].val ≠ v), or x_i[1].ts < ts' ∨ (x_i[1].ts = ts' ∧ x_i[1].val ≠ v'). Hence, ¬higherValid((ts, v), w0) or ¬higherValid((ts', v'), w1) holds. Therefore, C0 ∪ C1 ≠ ∅, so ⊥ cannot be returned, a contradiction. □
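The counting in Lemmas 1 and 2 rests on quorum intersection. As a sanity check, this Python sketch evaluates the worst-case bounds for small t with n = 4t + 1, the smallest n allowed: a completed WRITE reaches at least n − t base registers, a READ uses at least n − t responses, their intersection has at least n − 2t ≥ 2t + 1 members, and after discounting t faulty registers at least t + 1 correct registers report the written pair, while at most 2t responses can report an older one. The function name `quorum_bounds` and the dictionary keys are hypothetical.

```python
def quorum_bounds(n: int, t: int) -> dict:
    """Worst-case counts for one completed WRITE and one READ.

    Both operations wait for n - t of the n base registers; at most t of
    the base registers are faulty.
    """
    written = n - t                 # registers updated by the completed WRITE
    read = n - t                    # registers whose responses the READ uses
    overlap = written + read - n    # minimum size of the intersection
    correct_overlap = overlap - t   # worst case: every faulty one is in it
    # Responses that may show an older (or conflicting) pair: the at most
    # t registers the WRITE skipped, plus at most t faulty ones.
    stale_or_faulty = (n - written) + t
    return {"overlap": overlap,
            "correct_overlap": correct_overlap,
            "stale_or_faulty": stale_or_faulty}

for t in range(1, 6):
    b = quorum_bounds(4 * t + 1, t)
    assert b["overlap"] >= 2 * t + 1
    assert b["correct_overlap"] >= t + 1     # the written pair is safe
    assert b["stale_or_faulty"] <= 2 * t     # ... and never invalid
```

With n = 4t the same arithmetic leaves only t correct registers in the intersection, so this particular counting no longer guarantees the t + 1 threshold of safe.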


The two lemmas above imply the following:

Lemma 3 (1-Regularity). All the traces of the algorithm in Figure 1 are 1-regular.


Finally, both WRITE and READ are wait-free, because neither one ever awaits more than n − t replies and at least n − t base registers are responsive. We have proved the following:


Theorem 1. The algorithm in Figure 1 is an implementation of a wait-free 1-regular register from n > 4t regular base registers, at most t of which can incur NR-arbitrary faults.
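The wait-freedom argument relies only on each operation waiting for the first n − t responses. The following asyncio sketch illustrates that pattern: it issues one request per base register and returns as soon as n − t of them have answered, so up to t non-responsive registers cannot block it. The simulated `base_op` coroutine, its delays, and the `("ack", i)` replies are assumptions of this example. The sketch also simply cancels the outstanding requests, whereas Figure 1 lets them finish and uses the pending and old flags to keep each base register's interaction well-formed.

```python
import asyncio
from typing import Any, Dict, List

async def base_op(i: int, responsive: bool, reply: Any) -> Any:
    """Simulated access to base register x_i (hypothetical stand-in)."""
    if not responsive:
        await asyncio.Event().wait()     # a crashed register never answers
    await asyncio.sleep(0.01 * i)        # arbitrary finite delay
    return reply

async def await_quorum(n: int, t: int, responsive: List[bool]) -> Dict[int, Any]:
    """Issue n base operations and return once n - t of them complete."""
    tasks = {asyncio.create_task(base_op(i, responsive[i], ("ack", i))): i
             for i in range(n)}
    replies: Dict[int, Any] = {}
    pending = set(tasks)
    while len(replies) < n - t:
        done, pending = await asyncio.wait(pending,
                                           return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            replies[tasks[task]] = task.result()
    for task in pending:                 # stop waiting for the stragglers
        task.cancel()
    return replies
```

For instance, `asyncio.run(await_quorum(5, 1, [True, True, False, True, True]))` completes with replies from the four responsive registers. If more than t registers are unresponsive, the loop would wait forever, which is why the construction assumes at most t faulty base registers.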


5 Wait-free Regular Register Construction 


We now present a regular register construction from 1-regular and safe registers. We note that only one of the registers in this construction needs to be 1-regular; it suffices for the remaining registers to satisfy the weaker safe semantics. For completeness, we show in Appendix A a direct construction of a wait-free safe register from n > 4t regular base registers, up to t of which can be Byzantine faulty, using a technique similar to that of [16].

The algorithm in Figure 2 uses one wait-free 1-writer/m-reader 1-regular register, m 1-writer/1-reader safe registers writable by the writer, and m 1-writer/1-reader safe registers, each writable by one reader, to construct a wait-free 1-writer/m-reader regular register.

The read and write operations on the shared safe and 1-regular registers are denoted read and write; the operations of the constructed register are denoted WRITE and READ, and we show that they maintain regularity.

The writer's registers are called P and B_i, 1 ≤ i ≤ m, for primary and backup, respectively. These registers are used as buffers (or tracks) to store the values written by the writer. Each reader process i, 1 ≤ i ≤ m, uses a 1-writer/1-reader multi-valued safe register RL_i to signal the writer about a possible concurrent read in progress.

The construction works as follows. The writer starts by writing the primary register P (line 1). It then reads the reader registers RL_i (line 2) to see whether the value of any of these registers has changed since the last time it was read by the writer. If so, it saves the value just written in the backup register B_i of every reader i whose register RL_i was observed to change (lines 4-6), before returning ack (line 7).
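A single-process Python sketch of the writer's side of the handshake follows. It models the base registers as plain in-memory cells (the `Cell` class is hypothetical), so only the control flow of WRITE in Figure 2 is shown: write P, read every RL_i, and refresh B_i exactly for the readers whose RL_i changed since the writer last looked.

```python
from typing import Any, List

class Cell:
    """Hypothetical in-memory stand-in for a base register."""
    def __init__(self, value: Any) -> None:
        self.value = value
    def read(self) -> Any:
        return self.value
    def write(self, value: Any) -> None:
        self.value = value

class Writer:
    def __init__(self, P: Cell, RL: List[Cell], B: List[Cell]) -> None:
        self.P, self.RL, self.B = P, RL, B
        self.wl = [0] * len(RL)          # last RL_i value seen, per reader

    def write(self, v: Any) -> str:
        self.P.write(v)                              # line 1
        a = [rl.read() for rl in self.RL]            # line 2
        for i, a_i in enumerate(a):                  # lines 3-6
            if a_i != self.wl[i]:                    # reader i signalled
                self.wl[i] = a_i
                self.B[i].write(v)                   # back up v for reader i
        return "ack"                                 # line 7
```

Because wl[i] is updated together with the backup write, each change of RL_i triggers at most one write to B_i; later WRITEs leave B_i alone until reader i signals again.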

Reader i starts by writing its register RL_i (line 2) and then proceeds to read the primary track P (line 3). We consider the following two cases:


1. The read of the primary track by the reader (line 3) returns a value ≠ ⊥. In this case, the 1-regularity of P ensures that the return value preserves regularity.

2. The read of the primary track by the reader (line 3) returns ⊥. In this case, 1-regularity implies that read(P) by i is concurrent with at least two writes to P by the writer. Therefore, the writer's code is executed in whole at least once after read(P) has been invoked. This implies that the writer observes the change in the value of the register RL_i, and therefore writes i's backup track exactly once before read(P) completes. Therefore, when reader i eventually returns from read(P), it finds B_i already written, and therefore returns a correct value.
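The reader's side can be sketched the same way. To exercise case 2 without real concurrency, this sketch takes a caller-supplied function `read_P` that may return `BOTTOM`, standing in for the ⊥ that a 1-regular P may return when the read overlaps two or more writes. The `Cell` class, `BOTTOM`, and `read_P` are all assumptions of this example.

```python
from typing import Any, Callable

BOTTOM = object()   # hypothetical encoding of the special value ⊥

class Cell:
    """Hypothetical in-memory stand-in for a base register."""
    def __init__(self, value: Any) -> None:
        self.value = value
    def read(self) -> Any:
        return self.value
    def write(self, value: Any) -> None:
        self.value = value

class Reader:
    def __init__(self, RL_i: Cell, B_i: Cell,
                 read_P: Callable[[], Any]) -> None:
        self.RL_i, self.B_i, self.read_P = RL_i, B_i, read_P
        self.rl_i = 0                   # static counter, initially 0

    def read(self) -> Any:
        self.rl_i += 1                  # line 1
        self.RL_i.write(self.rl_i)      # line 2: signal the writer
        x = self.read_P()               # line 3
        if x is BOTTOM:                 # line 4: read(P) overlapped >= 2 writes
            x = self.B_i.read()         # line 5: fall back to the backup
        return x                        # line 6
```

Every READ writes a fresh counter value to RL_i, so the writer can distinguish consecutive reads by the same reader.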


Shared objects:
    1-writer/1-reader safe registers RL_i ∈ Integers, writable by reader i, 1 ≤ i ≤ m, and readable by the writer; initially 0;
    1-writer/m-reader 1-regular register P ∈ Vals ∪ {⊥}, writable by the writer and readable by the readers; initially v0;
    1-writer/1-reader safe registers B_i ∈ Vals, 1 ≤ i ≤ m, writable by the writer and readable by reader i; initially v0;

Local to the writer:
    static wl[1...m] ∈ Integers, initially ∀i wl[i] = 0;
    a_i ∈ Integers, for 1 ≤ i ≤ m;

Local to reader i, 1 ≤ i ≤ m:
    static rl_i ∈ Integers, initially 0;
    x ∈ Vals ∪ {⊥};

WRITE(v):
1:  write(P, v);
2:  a_i ← read(RL_i), for each i, 1 ≤ i ≤ m;
3:  if (∃i : a_i ≠ wl[i]) then
4:      for each i such that a_i ≠ wl[i]:
5:          wl[i] ← a_i;
6:          write(B_i, v);
7:  return ack;

READ_i, 1 ≤ i ≤ m:
1:  rl_i ← rl_i + 1;
2:  write(RL_i, rl_i);
3:  x ← read(P);
4:  if (x = ⊥) then
5:      x ← read(B_i);
6:  return x;

Figure 2: The 1-writer/m-reader Wait-Free Regular Register Construction.


We observe that all the writes to the backup registers (line 6) can be executed in parallel. Hence, in a distributed implementation, updating all the B_i's incurs only a single round-trip message latency. This could be optimized even further by combining the writes targeted at the same destinations into a single message.
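A hedged sketch of this optimization: given the base-register writes that line 6 generates, it groups them by destination server so that each server receives one combined message. It assumes, as in the appendix, that each B_i is emulated from base registers hosted on storage servers; the placement map (which servers host a replica of each B_i), the register names, and the function name `batch_by_destination` are assumptions of this example.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Update = Tuple[int, Any]  # (timestamp, value) to store in a backup register

def batch_by_destination(updates: Dict[str, Update],
                         placement: Dict[str, List[int]]) -> Dict[int, Dict[str, Update]]:
    """Group the base-register writes of one WRITE by destination server.

    updates:   backup register name (e.g. "B1") -> pair to write.
    placement: backup register name -> servers hosting one of its replicas.
    Returns one combined message (register -> pair) per destination server.
    """
    messages: Dict[int, Dict[str, Update]] = defaultdict(dict)
    for reg, pair in updates.items():
        for server in placement[reg]:
            messages[server][reg] = pair
    return dict(messages)
```

With two backup registers replicated on overlapping server sets {0, 1, 2} and {1, 2, 3}, the six individual base writes collapse into four messages, one per server.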


[Figure 3 timeline: READ R_i (R_i.write(RL_i), R_i.read(P), R_i.read(B_i)) overlapped by WRITE operations W_1, W_2, W_3; marked events W_1.write(P) and W_2.read(RL_i).]

Figure 3: Several WRITE operations overlapping READ.


The above intuition is formalized by the following lemma: 


Lemma 4. Let READ overlap one or more WRITE operations. Then READ returns one of the values 
written in an overlapping WRITE, or the value written in the latest WRITE operation preceding the 
READ. 


Proof. (We refer the reader to Figure 3 for intuition on this proof.) 


Let R_i be a READ operation by process i. R_i consists of a write R_i.write(RL_i) to RL_i, a read R_i.read(P) of P, and possibly a read R_i.read(B_i) of B_i.

For any WRITE operation W, we refer to specific operations within W as follows. We denote by W.write(P) the first write operation (to P), by W.read(RL_i) the read from RL_i, by W.a_i the value returned from W.read(RL_i), and by W.wl the value of wl at the beginning of W. We let W.write(B_i) denote the write to B_i, if it exists.

Let W_1 denote the latest WRITE such that W_1.write(P) terminates before R_i.write(RL_i) completes. Let the two WRITE operations succeeding W_1 be denoted W_2 and W_3, respectively.

By the choice of W_1, W_2.read(RL_i) strictly succeeds R_i.write(RL_i) (see Figure 2). Hence, if no WRITE operation before W_2 sees the value written by R_i.write(RL_i), then W_2.wl[i] ≠ W_2.a_i, and W_2.write(B_i) exists. Otherwise, some WRITE preceding W_2 already sees R_i.write(RL_i) and writes B_i. In either case, at the latest when W_2 completes, a write to B_i in response to R_i.write(RL_i) has completed. Moreover, B_i is written at most once in response to R_i.write(RL_i). The value stored in B_i by this write is the most up-to-date value written by a WRITE preceding or concurrent with R_i.

Now there are two possible cases. The first case is when R_i.read(P) returns a value other than ⊥. Then, by 1-regularity, it returns a good value from P (i.e., it returns the non-⊥ value of an overlapping WRITE or of the latest WRITE preceding the read).

The second case is when R_i.read(P) returns ⊥. Then, by 1-regularity, R_i.read(P) overlaps at least two WRITE operations. Since W_1.write(P) strictly precedes R_i.read(P), W_2 and W_3 overlap R_i.read(P). Hence, W_2 completes before R_i.read(P) terminates. As shown above, W_2 is, at the latest, a WRITE that sees R_i.write(RL_i) and updates B_i. Since this write to B_i completes before R_i.read(B_i) starts, and because no new write to B_i is invoked until R_i.read(B_i) returns, by the safety of B_i, R_i.read(B_i) returns a non-⊥ value that upholds regularity. □


Finally, if a READ operation R_i by process i is not concurrent with any WRITE, then, by the 1-regularity of P, R_i returns the value written to P by the latest WRITE preceding R_i if such a WRITE exists, and v0, the initial value of the register, otherwise. Also, the algorithm is wait-free, since every statement of the pseudocode is either a local step or a single operation on a wait-free register, so in a fair execution it can never block forever. Hence, we have proved the following:


Theorem 2. The algorithm in Figure 2 is an implementation of a 1-writer/m-reader wait-free regular register from one 1-writer/m-reader wait-free 1-regular register and 2m 1-writer/1-reader safe registers.


6 Conclusions 


We have presented a simple, efficient, and self-contained construction of a wait-free regular register from Byzantine components. This yields a practical building block for distributed storage applications tolerating Byzantine faults.
A Wait-Free Safe Register Construction


In this section we present a wait-free SWMR safe register construction from n > 4t regular registers, up to t of which can be Byzantine faulty (see Figure 4). The implementation is based on techniques similar to those of [16].

We now prove that the algorithm in Figure 4 is a safe register construction: 


Lemma 5 (Safety). All traces of the algorithm in Figure 4 are safe. 


Proof. Let R be a READ operation that returns, let W = WRITE(v) be a WRITE operation that returns before R is invoked, and assume that no WRITE operation W' that follows W is invoked before R returns. We prove that R returns v, thereby upholding safety. Indeed, by the WRITE code, write((ts, v)) terminates on at least n − t ≥ 3t + 1 base registers before R is invoked. Since READ awaits responses from at least n − t registers, by the regularity of the x_i, at least t + 1 correct registers respond with (ts, v) to the base object reads issued during R. Therefore, upon completion of line 4, safe((ts, v)) is true. Thus, (ts, v) ∈ C after line 5. Finally, since no WRITE operation that follows W is invoked before R returns, at most t (faulty) base registers can return some (ts', v') with ts' > ts or (ts' = ts ∧ v' ≠ v), and no such value can be safe. Hence, (ts, v) is the highest-timestamped value in C. By line 7, v must be R's return value in this case. □


Finally, the construction is trivially wait-free, since neither WRITE nor READ ever awaits more than n − t responses and at least n − t registers are correct. Hence, we have proved the following:


Theorem 3. The algorithm in Figure 4 is an implementation of a SWMR safe register from n > 4t regular base registers, up to t of which can be Byzantine faulty.
Shared objects:
    SWMR regular registers x_i ∈ TSVals, 1 ≤ i ≤ n; initially x_i = (ts0, v0);

WRITE Emulation

Local variables:
    enabled[1...n], pending[1...n] ∈ Boolean, initially ∀i enabled[i] = pending[i] = false;
    w ∈ TSVals; ts ∈ TS;

WRITE(v):
    choose ts ∈ TS larger than previously used;
    w ← (ts, v);
    for 1 ≤ i ≤ n, enabled[i] ← true;
    repeat
        CHECK;
    until |{i : ¬enabled[i] ∧ ¬pending[i]}| ≥ n − t;
    return ack;

CHECK:
    if (∃i : enabled[i] ∧ ¬pending[i]) then
        (enabled[i], pending[i]) ← (false, true);
        INVOKE write(x_i, w);
    if (∃i : x_i RESPONDED) then
        pending[i] ← false;

READ Emulation

Local variables:
    enabled[1...n], pending[1...n], old[1...n] ∈ Boolean, initially ∀i enabled[i] = pending[i] = false;
    w[1...n], tmp[1...n] ∈ TSVals;

Predicate and macro definitions:
    safe(c) ≜ |{i : w[i] = c}| ≥ t + 1;

READ:
1:  for 1 ≤ i ≤ n, if (pending[i]) then old[i] ← true;
2:  for 1 ≤ i ≤ n, enabled[i] ← true;
3:  for 1 ≤ i ≤ n: w[i] ← ⊥;
4:  repeat
        CHECK;
    until |{i : ¬enabled[i] ∧ ¬pending[i]}| ≥ n − t;
5:  C ← {c ∈ TSVals : safe(c)};
6:  if (C ≠ ∅) then
7:      return c.val : c.ts = max{c'.ts : c' ∈ C};
8:  return v0;

CHECK:
    if (∃i : enabled[i] ∧ ¬pending[i]) then
        (enabled[i], pending[i]) ← (false, true);
        INVOKE tmp[i] ← read(x_i);
    if (∃i : x_i RESPONDED) then
        if (¬old[i]) then
            w[i] ← tmp[i];
        pending[i] ← false; old[i] ← false;

Figure 4: The SWMR safe register emulation.
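For comparison with the 1-regular READ, the decision step of the safe READ in Figure 4 (lines 5 to 8) reduces to a single safe predicate plus a highest-timestamp choice, with no validity check. A minimal Python sketch, assuming responses are (timestamp, value) pairs with None for ⊥ and that t and v0 are passed in explicitly; the function name `safe_read_result` is hypothetical.

```python
from typing import Any, List, Optional, Tuple

TSVal = Tuple[int, Any]  # (timestamp, value)

def safe_read_result(w: List[Optional[TSVal]], t: int, v0: Any) -> Any:
    """Lines 5-8 of READ in Figure 4."""
    # Line 5: candidates reported by at least t + 1 base registers.
    seen = {x for x in w if x is not None}
    C = [c for c in seen if sum(1 for x in w if x == c) >= t + 1]
    if C:                                          # line 6
        return max(C, key=lambda c: c[0])[1]       # line 7: highest timestamp
    return v0                                      # line 8: initial value
```

Falling back to v0 when no pair is safe can only happen when the READ overlaps a WRITE (as the proof of Lemma 5 shows), and safe semantics allow any value in that case.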